How File Scan Options are Derived
When a node scan is scheduled, the files to be scanned are derived from both the file scan options of all node groups the node is a member of and any files specified in file checks in attached policies. This article describes how the final set of file scan options is derived and sent to the Agent or Connection Manager to perform a scan.
Overview
File scan options are sent to the Connection Manager upon every scan request and they instruct the Connection Manager which files and directories to traverse and collect. In simple cases, a node may only inherit basic scan options of common system paths from those Cloudhouse Guardian (Guardian) prescribes, in which case the resulting files that appear in a node scan should be recognizable. In more complex cases, it may be difficult to determine why a particular file was collected during a node scan, especially if multiple sources are linked to the node group.
Scan options are collected from a number of sources:
- A node inherits all file scan options from all node groups it is a member of;
- Scan options can cancel each other out if paths from one group use an exclusion rule on top of file scan options from another group;
- For conflicting inclusion and exclusion rules, a user-specified order of precedence may be applied;
- Policies that are assigned to node groups and contain file-based checks also have their file paths inherited.
This article provides a number of worked examples to demonstrate how a final file scan instruction set is derived and sent to a Connection Manager. This page can be particularly useful if you are designing your scan options and either cannot explain why certain files are being scanned, or if you are hitting a file scan limit.
Example 1: Scan Options from Single Node Group Membership
In this example, we will use a single Linux node. All Linux-based nodes are automatically added to the Linux node group, which contains a number of default file scan options. The following is an adapted subset of the official file scan option defaults, which we will use for this example:
- R1. /home/user/app/app.json
- R2. /etc/*.conf
- R3. /etc/**/*.conf
Here, our example node will inherit these paths from the Linux node group and scan:
- the specific file at /home/user/app/app.json (rule 1)
- any file ending in .conf in the /etc directory (rule 2)
- any file ending in .conf in subdirectories of /etc (rule 3)
For more information about these scan options types, see Scan Options.
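The three rule types above can be sketched as a small path matcher. This is only an illustration, not Guardian's implementation: the functions `pattern_to_regex` and `matches` are hypothetical, and the semantics are assumed from the descriptions in this example (`*` stays within one directory, `**/` spans one or more subdirectories).

```python
import re


def pattern_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate a scan-option pattern into a regex (assumed semantics)."""
    parts = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**/", i):
            # Assumption: greedy-star spans one or more subdirectories.
            parts.append(r"(?:[^/]+/)+")
            i += 3
        elif pattern[i] == "*":
            # Assumption: a single star never crosses a "/" boundary.
            parts.append(r"[^/]*")
            i += 1
        else:
            parts.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(parts))


def matches(pattern: str, path: str) -> bool:
    """Return True if the whole path satisfies the pattern."""
    return pattern_to_regex(pattern).fullmatch(path) is not None


rules = ["/home/user/app/app.json", "/etc/*.conf", "/etc/**/*.conf"]
for candidate in ["/etc/resolv.conf", "/etc/ssh/ssh.conf", "/etc/hosts"]:
    hits = [rule for rule in rules if matches(rule, candidate)]
    print(candidate, "->", hits or "not scanned")
```

Under these assumptions, /etc/resolv.conf is picked up by rule 2 only and /etc/ssh/ssh.conf by rule 3 only, which is why the defaults list both patterns.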
Example 2: Scan Options from Multiple Node Groups
More often than not, a node will belong to two or more node groups. For example, an app server node could belong to:
- The Linux node group, as its base operating system is Linux
- An App Server node group, as it may have reporting, change detection or policies associated with it as part of a larger app server group of nodes
- A Production node group, as it may have additional checks that are run on production nodes only.
For this example, assume the following file scan options are defined for each of these node groups.
Linux Node Group
- R1. /etc/*.conf
- R2. /etc/**/*.conf
App Server Node Group
- R3. /home/user/app/app.json
- R4. /home/user/app/db.yml
Production Node Group
- R5. /home/user/shared/server.key
- R6. /home/user/shared/server.crt
- R7. /home/user/app/app.json
Here, the following file scan options will be derived for this node’s node scan:
- any file ending in .conf in the /etc directory (rule 1)
- any file ending in .conf in subdirectories of /etc (rule 2)
- the specific file /home/user/app/app.json (rule 3 and rule 7)
- the specific file /home/user/app/db.yml (rule 4)
- the specific file /home/user/shared/server.key (rule 5)
- the specific file /home/user/shared/server.crt (rule 6)
Here you can see that a node inherits the union of all file scan options from all node groups that it belongs to. You will also notice that the specific file at /home/user/app/app.json is specified twice (once for the App Server group and once for the Production group). This entry is deduplicated during the pre-scan derivation of file scan options.
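The union-and-deduplicate step can be sketched in a few lines. The function name `derive_scan_options` and the dictionary layout are hypothetical, chosen only to mirror the three node groups in this example; the real derivation happens inside Guardian before the scan request is sent.

```python
from typing import Dict, List


def derive_scan_options(group_options: Dict[str, List[str]]) -> List[str]:
    """Union the scan options of every group, dropping duplicate entries."""
    merged = [path for paths in group_options.values() for path in paths]
    # dict.fromkeys keeps the first occurrence of each path, in order.
    return list(dict.fromkeys(merged))


groups = {
    "Linux": ["/etc/*.conf", "/etc/**/*.conf"],
    "App Server": ["/home/user/app/app.json", "/home/user/app/db.yml"],
    "Production": [
        "/home/user/shared/server.key",
        "/home/user/shared/server.crt",
        "/home/user/app/app.json",  # duplicate of the App Server entry
    ],
}

derived = derive_scan_options(groups)
for path in derived:
    print(path)
```

Seven rules go in and six scan options come out, because the app.json entry appears only once in the derived set.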
Example 3: Scan Options from Node Groups with Exclusion Rules and Order of Precedence
In this example, we will use two nodes called AppDev and AppProd. Both nodes are members of the Windows node group as they are both Windows-based nodes. Both nodes are also in an additional node group called App Servers as they are configured to have similar functions. The AppProd node is also in a Production node group, whereas the AppDev node is not.
The file scan options for the node groups are defined as follows:
Windows Node Group
- R1. C:\Windows\System32\**\*.ini
- R2. C:\Windows\System32\*.ini
App Servers Node Group
- R3. C:\App\pages\**\*.html
- R4. C:\App\config\*
- R5. C:\App\pages\src\.subversion\build-hash
Production Node Group
- R6. !C:\Windows\System32\secret\keys.ini
- R7. !C:\App\config\*.pem
- R8. !C:\App\pages\**\.subversion
Here, the AppDev node will simply inherit the union of the file scan options from the Windows and App Servers node groups, following the same logic as in Example 2: Scan Options from Multiple Node Groups.
As for AppProd, the following file scan options will be derived:
- Any file in C:\Windows\System32 and its subdirectories that ends in .ini, except for the specific file C:\Windows\System32\secret\keys.ini (rule 1, rule 2, and rule 6)
- Any file in subdirectories of C:\App\pages that ends with .html and is not in any directory or subdirectory that has a directory called .subversion in its path (rule 3 and rule 8)
- The specific file at C:\App\pages\src\.subversion\build-hash (rule 5)
- Any file in the directory C:\App\config that does not end in .pem (rule 4 and rule 7)
Each example above shows how a wider greedy-star or wildcard pattern can be trumped by a more specific inclusion or exclusion rule. Of particular interest is how the second and third derived scan options are constructed from rules 3, 5 and 8. The specific file at C:\App\pages\src\.subversion\build-hash is included, even with the exclusion rule (rule 8), because absolute paths have precedence over wildcard and greedy-star rules. In this example, if you want the file at C:\App\pages\src\.subversion\build-hash to be excluded given the current rule set, you can define a custom precedence value on rule 8 so that it takes artificial priority over rule 5.
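The resolution described above can be sketched as a small model of AppProd's rule set. This is not Guardian's implementation: the `Rule` class, its `priority` field, and `is_scanned` are hypothetical, and three assumptions are made to reproduce the derived options listed in this example: absolute paths outrank any rule containing a star, an exclusion wins a tie between rules of equal rank, and an exclusion covers everything beneath the path it names.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

SEP = "\\"  # Windows path separator (a single backslash)


@dataclass
class Rule:
    pattern: str                    # a leading "!" marks an exclusion rule
    priority: Optional[int] = None  # hypothetical custom precedence override

    @property
    def excluded(self) -> bool:
        return self.pattern.startswith("!")

    @property
    def body(self) -> str:
        return self.pattern[1:] if self.excluded else self.pattern

    def rank(self) -> int:
        # Assumption: absolute paths (2) outrank star patterns (1)
        # unless a custom priority has been set on the rule.
        if self.priority is not None:
            return self.priority
        return 1 if "*" in self.body else 2

    def matches(self, path: str) -> bool:
        parts, i = [], 0
        while i < len(self.body):
            if self.body.startswith("**" + SEP, i):
                parts.append(r"(?:[^\\]+\\)+")  # one or more directories
                i += 3
            elif self.body[i] == "*":
                parts.append(r"[^\\]*")         # stays within one segment
                i += 1
            else:
                parts.append(re.escape(self.body[i]))
                i += 1
        # Assumption: an exclusion also covers everything beneath its path.
        suffix = r"(?:\\.*)?" if self.excluded else ""
        return re.fullmatch("".join(parts) + suffix, path) is not None


def is_scanned(path: str, rules: List[Rule]) -> bool:
    hits = [rule for rule in rules if rule.matches(path)]
    if not hits:
        return False
    top = max(rule.rank() for rule in hits)
    # Assumption: when the highest-ranked rules tie, an exclusion wins.
    return not any(rule.excluded for rule in hits if rule.rank() == top)


rules = [
    Rule(r"C:\Windows\System32\**\*.ini"),             # R1
    Rule(r"C:\Windows\System32\*.ini"),                # R2
    Rule(r"C:\App\pages\**\*.html"),                   # R3
    Rule(r"C:\App\config\*"),                          # R4
    Rule(r"C:\App\pages\src\.subversion\build-hash"),  # R5
    Rule(r"!C:\Windows\System32\secret\keys.ini"),     # R6
    Rule(r"!C:\App\config\*.pem"),                     # R7
    Rule(r"!C:\App\pages\**\.subversion"),             # R8
]

build_hash = r"C:\App\pages\src\.subversion\build-hash"
print(is_scanned(build_hash, rules))

# Giving rule 8 a custom priority above rule 5 flips the outcome.
overridden = rules[:7] + [Rule(r"!C:\App\pages\**\.subversion", priority=3)]
print(is_scanned(build_hash, overridden))
```

With the default ranking the build-hash file is scanned because rule 5 is absolute; once rule 8 carries the higher custom priority, the same file is excluded.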
For a more thorough description of the rules, their order of precedence and how to override the default order of operations with a custom priority order, please visit our guide on Order of Precedence of Conflicting Rules. For more information on adding exclusion rules, please visit our guide on File Exclusion via Path Negation.
Example 4: Scan Options from Node Group Membership and Associated Policy
File scan options can also be included from file-based checks defined in policies associated with node groups that a node is a member of. As policy file checks usually define an absolute path to a file, these paths will generally be included even when other exclusion rules exist, because absolute paths take precedence.
In this example, we will define a single node that belongs to a Webserver node group, and this node group has a policy associated with it. The node group has the following file scan options defined:
- R1. /etc/**/*.conf
- R2. /etc/*.conf
The associated policy has the following checks:
- C1. The file at /home/app/shared/server.crt should exist and have the checksum 3ec061bf2752cf35f358b689212dceeb
- C2. The file at /home/app/shared/server.key should exist and have the checksum c0bd1423a970449eb51d18a2c0b22ab7
The following file scan options will be derived for this node:
- Any file in any subdirectories of /etc that ends in .conf (rule 1)
- Any file in the directory /etc that ends in .conf (rule 2)
- The specific file /home/app/shared/server.crt (check 1)
- The specific file /home/app/shared/server.key (check 2)
Here, the derived file scan options are simply a union of any scan options inherited from node group membership combined with any specific files listed in associated policies in file-based checks.
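As a rough sketch of that union, the snippet below combines the node group's rules with the paths named in the policy's file checks. The function `derive_with_policies` and the dictionary shape used for a check are hypothetical; only the paths and checksums come from this example.

```python
from typing import Dict, List


def derive_with_policies(
    group_options: List[str], policy_checks: List[Dict[str, str]]
) -> List[str]:
    """Combine node group scan options with paths used in file checks."""
    check_paths = [check["path"] for check in policy_checks]
    # dict.fromkeys drops any path that is listed more than once.
    return list(dict.fromkeys(list(group_options) + check_paths))


webserver_options = ["/etc/**/*.conf", "/etc/*.conf"]
policy_checks = [
    {
        "path": "/home/app/shared/server.crt",
        "checksum": "3ec061bf2752cf35f358b689212dceeb",
    },
    {
        "path": "/home/app/shared/server.key",
        "checksum": "c0bd1423a970449eb51d18a2c0b22ab7",
    },
]

derived = derive_with_policies(webserver_options, policy_checks)
for path in derived:
    print(path)
```

Only the path of each check feeds into the scan options; the checksum is evaluated later, once the file has been collected.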
Tip: If you are only concerned with files that are included in associated policies, you can remove all inherited node group file scan options and scan only the policy-relevant files.
What Next?
For more information on file scan options as well as other types of scan options, please visit our guide on Scan Options.